European Radiology
Springer Science and Business Media LLC
Preprints posted in the last 7 days, ranked by how well they match European Radiology's content profile, based on 14 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.
Chaves, E. T.; Teunis, J. T.; Digmayer Romero, V. H.; van Nistelrooij, N.; Vinayahalingam, S.; Sezen-Hulsmans, D.; Mendes, F. M.; Huysmans, M.-C.; Cenci, M. S.; Lima, G. d. S.
Background: Radiographic detection of caries lesions adjacent to restorations is challenging due to limitations of two-dimensional imaging and difficulties distinguishing true lesions from restorative or anatomical radiolucencies. Artificial intelligence (AI)-based clinical decision support systems (CDSSs) have been introduced to assist radiographic interpretation; however, different AI tools may yield variable diagnostic outputs, and their comparative performance remains unclear. Objective: To compare the diagnostic performance of commercial and experimental AI algorithms for detecting secondary caries lesions on bitewings. Methods: This cross-sectional diagnostic accuracy study included 200 anonymized bitewings comprising 885 restored tooth surfaces. A consensus group reference standard identified all surfaces with a caries lesion and classified each lesion by type (primary/secondary) and depth (enamel-only/dentin-involved). Five commercial (Second Opinion, CranioCatch, Diagnocat, DIO Inteligencia, and Align X-ray Insights) and three experimental (Mask R-CNN-based and Mask DINO-based) systems were tested. Diagnostic performance was expressed through sensitivity, specificity, and overall accuracy (95% CI). Comparisons used generalized estimating equations, adjusted for clustered data. Results: Specificity was high across all systems (0.957-0.986), confirming accurate recognition of non-carious surfaces, whereas sensitivity was moderate (0.327-0.487), reflecting frequent missed detections of enamel and dentin lesions. Accuracy ranged from 0.882 to 0.917, with no significant differences among models (p >= 0.05). Confounding factors, such as radiographic overlapping, marginal restoration defects, and cervical artifacts, were the main sources of misclassification. Conclusions: AI algorithms, regardless of architecture or commercial status, showed similar diagnostic capabilities and a conservative detection profile, favoring specificity over sensitivity. 
Improvements in dataset diversity, labeling precision, and explainability may further enhance reliability for secondary caries detection. Clinical Significance: AI-based CDSSs assist clinicians by providing consistent detection. Their high specificity is particularly valuable in minimizing unnecessary invasive treatments (overtreatment), though they should be used as adjuncts rather than a replacement for expert judgment.
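The three metrics reported in this abstract all derive from a 2x2 table of restored surfaces. The sketch below shows the arithmetic and why accuracy can stay near 0.9 while sensitivity is below 0.5: most surfaces are non-carious, so accuracy is dominated by specificity. The counts are hypothetical (the abstract does not report per-system confusion counts); they were chosen only to sum to the 885 surfaces mentioned above.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy from 2x2 confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),   # carious surfaces correctly flagged
        "specificity": tn / (tn + fp),   # sound surfaces correctly cleared
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Hypothetical counts, NOT taken from the study: 100 carious and 785 sound surfaces.
example = diagnostic_metrics(tp=40, fn=60, fp=15, tn=770)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, a system that misses 60% of lesions still reaches an accuracy above 0.9, which is why sensitivity and specificity are reported separately.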
de Boer, S.; Häntze, H.; Ziegelmayer, S.; van Ginneken, B.; Prokop, M.; Bressem, K. K.; Hering, A.
Background: Medical imaging, especially computed tomography and magnetic resonance imaging, is essential in the clinical care of patients with renal cell carcinoma (RCC). Artificial intelligence (AI) research into computer-aided diagnosis, staging and treatment planning needs curated and annotated datasets. Across the literature, The Cancer Genome Atlas (TCGA) datasets are widely used for model training and validation. However, re-annotation is often necessary due to limited access to public annotations, raising entry barriers and hindering comparison with prior work. Methods: We screened 1915 CT scans from three TCGA-RCC databases and employed a segmentation model to annotate kidney lesions. After a metadata-based exclusion step, we hosted a reader study with all papillary (n=56), chromophobe (n=27) and 200 randomly selected clear cell RCC cases. Two students quality-checked and corrected the data and annotated tumors and cysts. Uncertain cases were checked by a board-certified radiologist. Results: After data exclusion and quality control, a total of 142 annotated CT scans from 101 patients (26 female, 75 male, mean age 56 years) remained. This includes 95 CTs with clear cell RCC, 29 with papillary RCC and 18 with chromophobe RCC. Images and voxel-level annotations of kidneys and lesions are open sourced at https://zenodo.org/records/19630298. Conclusion: By making the annotations open-source, we encourage accessible and reproducible AI research for renal cell carcinoma. We invite other researchers who have previously annotated any of these cohorts to share their annotations.
Authamayou, B.; Marnat, G.; Matsulevits, A.; Munsch, F.; Lavielle, A.; Courbin, N.; Foulon, C.; Chen, B.; Micard, E.; Gory, B.; L'Allinec, V.; Bourcier, R.; Naggara, O.; Lauze, E.; Boulouis, G.; Lapergue, B.; Eker, O.; Sibon, I. P.; Thiebaut de Schotten, M.; Tourdias, T.
Background: Acute basilar artery occlusion (BAO) causes devastating strokes. Despite the benefit of endovascular treatment, optimal management remains controversial in some situations, such as for patients with mild deficits, and would benefit from robust prognostic tools. Given the dense white matter networks within the posterior fossa, we tested whether quantifying disconnections from acute diffusion-weighted imaging (DWI) could improve prediction of outcome and of response to recanalization compared with conventional metrics. Methods: We conducted a secondary analysis from a prospective multicenter stroke registry, including consecutive patients (2017-2024) with BAO and admission MRI. Ultra-high-resolution diffusion MRI was acquired in healthy participants to build normative tractograms with optimized posterior fossa quality. Patient infarcts delineated on DWI were projected onto these tractograms to estimate disconnected fiber volume. The primary outcome was 90-day modified Rankin Scale (mRS) 0-3 vs 4-6. Predictive performance of disconnected fiber volume was compared with baseline NIHSS, infarct volume, and posterior circulation ASPECTS (pc-ASPECTS) using logistic regressions and areas under receiver operating characteristic curves (AUC). Ordinal regressions tested associations across the full mRS spectrum, stratified by recanalization status. Analyses were repeated in patients with NIHSS ≤10. Results: Among 201 patients (median age 70; NIHSS 10), 97 (48.3%) had poor outcome. Despite small median infarct volume (4.75 mL), disconnected fiber volume was substantial (median 25.15 mL). Disconnected fiber volume achieved an AUC of 0.84, outperforming NIHSS (0.67; p<0.0001), infarct volume (0.75; p=0.00059), and pc-ASPECTS (0.76; p=0.0127). Low disconnected fiber volume predicted better outcomes across the full mRS (OR=0.12 [95% CI, 0.065-0.204]) and greater benefit from successful recanalization (OR=0.33 [95% CI, 0.15-0.70]). 
In patients with NIHSS ≤10 (n=102), disconnected fiber volume remained the strongest predictor (AUC=0.83). Conclusions: Disconnected fiber volume derived indirectly is a robust prognostic marker of BAO outcomes that outperforms conventional predictors and may support future treatment decisions. Registration: https://clinicaltrials.gov - NCT03776877.
Hofmeister, J.; Bernava, G.; Rosi, A.; Brina, O.; Reymond, P.; Muster, M.; Lovblad, K.-O.; Machi, P.
Background: Even for experienced operators, endovascular treatment of unruptured intracranial aneurysms involves intraoperative uncertainty that may lead to adjustments in strategy, prolong the procedure, and potentially cause inefficiency and device waste. This study aimed to evaluate whether pre-procedural testing (PPT) of endovascular treatment using patient-specific models was associated with increased operator confidence and perceived clinical utility, including improvements in procedural efficiency and reduced resource waste. Methods: We enrolled a cohort of patients who underwent PPT before endovascular treatment for complex unruptured intracranial aneurysms and compared their outcomes with a control group treated without PPT. The primary outcome was the Training Fidelity Score, a composite of three operator-reported Likert items defined a priori. Secondary outcomes included perceived clinical utility, intraoperative strategy changes, procedural time, radiation exposure, device waste and safety. Results: A total of 85 patients met the inclusion criteria (PPT=40; control=45). The Training Fidelity Score was high across the PPT group (median, 4.33/5). Perceived clinical utility was high and further increased significantly after the procedure. A significant reduction was observed in intraoperative strategy changes, with no changes recorded in the PPT group, compared to 6/45 in the control group (RR 0.09; p=0.027). Reductions in treatment time, radiation exposure and device waste were also noted. Conclusion: PPT using patient-specific models was associated with increased operator confidence, fewer intraoperative strategy changes, improved procedural efficiency, and reduced device waste without compromising safety. These findings support its use in pre-interventional preparation, but require prospective multicenter validation.
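A relative risk with zero events in one arm (0/40 vs 6/45 here) is undefined or exactly zero unless some adjustment is made. The abstract does not say how the reported RR of 0.09 was obtained; the sketch below assumes a Haldane-Anscombe continuity correction (adding 0.5 to every cell), which is one common convention and happens to give a value that rounds to the reported figure. The function name and the choice of correction are illustrative assumptions.

```python
def relative_risk(events_a, n_a, events_b, n_b, correction=0.5):
    """Relative risk of group A vs group B.

    If any cell of the 2x2 table is zero, `correction` is added to all four
    cells (Haldane-Anscombe style). This is an assumed convention, not
    necessarily the method used in the study.
    """
    cells = [events_a, n_a - events_a, events_b, n_b - events_b]
    if any(c == 0 for c in cells):
        cells = [c + correction for c in cells]
    a, b, c, d = cells
    return (a / (a + b)) / (c / (c + d))

# Intraoperative strategy changes: 0/40 with PPT vs 6/45 in controls.
rr = relative_risk(0, 40, 6, 45)
print(round(rr, 2))  # 0.09
```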
Dell'Orco, A.; De Vita, E.; D'Arco, F.; Lange, A.; Rüber, T.; Kaindl, A. M.; Wattjes, M. P.; Thomale, U. W.; Becker, L.-L.; Tietze, A.
Focal cortical dysplasias (FCDs) are one of the most common structural causes of drug-resistant epilepsy in children but are frequently subtle and difficult to detect on conventional MRI. Many automated lesion detection methods have therefore been proposed to support neuroradiological assessment. In this study, we externally validated two recently developed deep-learning approaches for FCD detection, MELD Graph and 3D-nnUNet, in a pediatric cohort. In this retrospective single-center study, brain MRI scans of 71 children evaluated for epilepsy were analyzed, including 35 MRI-positive patients with suspected FCD and 36 MRI-negative cases based on the primary radiology reports. Both models were applied to standard 3D T1-weighted and 3D FLAIR images. Detected lesions were reviewed by an experienced pediatric neuroradiologist and classified as true positive, false positive, or false negative. Clinical semiology and EEG findings were additionally evaluated for cases with false-positive detections. At the lesion level, MELD Graph achieved a precision of 0.85 and recall of 0.52, while 3D-nnUNet achieved a precision of 0.91 and recall of 0.48. In the MRI-negative patients, MELD Graph produced more false-positive detections than 3D-nnUNet (0.53 vs. 0.14 false-positive lesions per patient). At the patient level, MELD Graph showed slightly higher sensitivity than 3D-nnUNet (0.63 vs. 0.54), whereas 3D-nnUNet demonstrated markedly higher specificity (0.86 vs. 0.56). Improved FLAIR image quality was associated with trends toward improved model performance. Both models demonstrated high precision but moderate sensitivity, indicating that they are valuable decision-support tools but cannot replace expert neuroradiological evaluation. Optimized MRI acquisition protocols are needed to further improve automated lesion detection in pediatric epilepsy.
Yang, S.; Zhong, Y.; Yang, B.
Introduction: Cervical spondylotic myelopathy (CSM) surgery is frequently associated with residual neurological deficits, partly due to unrecognized dynamic spinal cord compression on conventional MRI. Current static imaging may miss position-dependent stenosis, resulting in insufficient or inappropriate decompression. This study aims to evaluate whether dynamic MRI-guided individualized surgery improves neurological outcomes compared to conventional MRI-based planning. Objectives: This study aims to examine the association between dynamic MRI-guided surgical planning and neurological recovery in cervical spondylotic myelopathy, and to evaluate its role in identifying responsible segments, avoiding excessive surgery, and improving clinical outcomes. Methods: This single-center retrospective cohort study will include 300 patients who underwent cervical spine surgery between January 2020 and December 2025 at the First Affiliated Hospital of Guangxi University of Chinese Medicine. Patients will be categorized into the dynamic MRI-guided group (n=150) or conventional MRI-based group (n=150) based on preoperative imaging modality. 1:1 propensity score matching will be performed using age, sex, BMI, disease duration, baseline mJOA score, and number of compressed segments. The primary outcome is the rate of improvement in the mJOA score at 6 months postoperatively. Secondary outcomes include VAS, NDI, reoperation rate, and time to first complication. Between-group comparisons will use t-tests/Mann-Whitney U tests for continuous variables, χ² tests/Fisher's exact tests for categorical variables, and Kaplan-Meier estimates with the log-rank test for time-to-event outcomes. A two-sided P<0.05 will be considered significant. Analyses will be performed using R software (version 4.4.1). Ethical approval was obtained from the Medical Ethics Committee of the First Affiliated Hospital of Guangxi University of Chinese Medicine (Approval No. 2025-080-KY-01), valid from February 06, 2026 to February 05, 2027. Expected outcomes: We hypothesize that dynamic MRI-guided surgical planning will improve neurological recovery and decompression accuracy in cervical spondylotic myelopathy, providing evidence for optimized preoperative imaging and precision spine surgery.
Namvar, A.; Shan, B.; Hoff, B.; Labaki, W. W.; Murray, S.; Bell, A. J.; Galban, S.; Kazerooni, E. A.; Martinez, F. J.; Hatt, C. R.; Han, M. K.; Galban, C. J.; Ram, S.
Purpose: To develop an interpretable feature-based Deep Parametric Response Mapping (PRMD) method that combines wavelet scattering convolution networks and machine learning to spatially detect and quantify functional small airways disease (fSAD) and emphysema on paired inspiratory-expiratory CT scans, with enhanced noise robustness. Materials and Methods: In this retrospective analysis of prospectively acquired data (2007-2017), we developed and validated a deep learning-based PRM approach using paired CT scans from 8,972 tobacco-exposed COPDGene participants (≥10 pack-years; mean age 60.1 ± 8.8 years; 46.5% women), including controls with normal spirometry (n = 3,872), PRISm (n = 1,089), and GOLD 1-4 COPD (n = 4,011). Data were stratified into training, validation, and testing sets (24:6:70). PRMD extracts translation-invariant image features using a wavelet scattering network and applies a subspace learning classifier to classify voxels as emphysema or non-emphysematous air trapping (fSAD). PRMD was compared with conventional density-based PRM for voxel-wise agreement, correlation with pulmonary function, robustness to noise, and sensitivity to misregistration using Pearson correlation, Bland-Altman analysis, and paired t tests. Results: PRMD achieved 95% voxel-wise agreement with standard PRM (r = 0.98) while demonstrating significantly greater robustness under noise. PRMD showed stronger correlations with FEV1 (emphysema: r = -0.54; fSAD: r = -0.51; P < 0.0001) than standard PRM (r = -0.42 for both; P < 0.0001). Under simulated high-noise conditions, standard PRM overestimated disease by ~15%, whereas PRMD limited error to < 5% (P < 0.001). Conclusion: PRMD provides an interpretable, feature-driven and noise-resilient alternative to traditional PRM for emphysema and fSAD classification, enhancing the reliability of CT-based COPD phenotyping for multi-center studies and low-dose imaging applications.
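For context, the conventional density-based PRM that PRMD is compared against can be sketched as a pair of thresholds applied to spatially registered inspiratory and expiratory voxels. The cut-offs used below (-950 HU on inspiration, -856 HU on expiration) are the values commonly cited for PRM in the COPD literature; they are an assumption here, since the abstract does not state them, and the function names are illustrative. This is not the PRMD method itself, which replaces thresholding with wavelet scattering features and a learned classifier.

```python
import numpy as np

# Assumed conventional PRM thresholds in Hounsfield units (not given in the abstract).
INSP_THRESHOLD_HU = -950
EXP_THRESHOLD_HU = -856

def classify_prm(insp_hu, exp_hu):
    """Label each registered voxel pair using density thresholds only."""
    insp = np.asarray(insp_hu, dtype=float)
    exp = np.asarray(exp_hu, dtype=float)
    labels = np.full(insp.shape, "unclassified", dtype="<U12")
    low_insp = insp < INSP_THRESHOLD_HU
    low_exp = exp < EXP_THRESHOLD_HU
    labels[~low_insp & ~low_exp] = "normal"
    labels[~low_insp & low_exp] = "fSAD"        # air trapping without emphysema
    labels[low_insp & low_exp] = "emphysema"
    return labels

def fractions(labels):
    """Fraction of voxels assigned to each class."""
    values, counts = np.unique(labels, return_counts=True)
    return {str(v): c / labels.size for v, c in zip(values, counts)}

# Four hypothetical voxel pairs (inspiration HU, expiration HU).
labels = classify_prm([-900, -900, -980, -980], [-800, -900, -900, -800])
print(labels.tolist())
```

Because the labels depend on hard cut-offs, a small amount of additive noise near either threshold can flip a voxel's class, which is the sensitivity to noise that the abstract reports for standard PRM.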
Auger, S. D.; Varley, J.; Hargovan, M.; Scott, G.
Background: Current medical large language model (LLM) evaluations largely rely on small collections of cases, whereas rigorous safety testing requires large-scale, diverse, and complex cases with verifiable ground truth. Multiple Sclerosis (MS) provides an ideal evaluation model, with validated diagnostic criteria and numerous paraclinical tests informing differential diagnosis, investigation, and management. Methods: We generated synthetic MS cases with ground-truth labels for diagnosis, localisation, and management. Four frontier LLMs (Gemini 3 Pro/Flash, GPT 5.2/5 mini) were instructed to analyse cases to provide anatomical localisation, differential diagnoses, investigations, and management plans. An automated evaluator compared these outputs to the ground-truth labels. Blinded subspecialty experts validated 70 cases for realism and automated evaluator accuracy. We then evaluated LLM decision-making across 1,000 cases and scaled to 10,000 to characterise rare, catastrophic failures. Results: Subspecialist expert review confirmed 100% synthetic case realism and 99.8% (95% CI 95.5 to 100) automated evaluation accuracy. Across 1,000 generated MS cases, all LLMs successfully included MS in the differential diagnoses in more than 91% of cases. However, diagnostic competence was not associated with treatment safety. Gemini 3 models had low rates of clinically appropriate steroid recommendations (Flash: 7.2%, 95% CI 5.6 to 8.8; Pro: 15.8%, 95% CI 13.6 to 18.1) compared to GPT 5 mini (23.5%, 95% CI 20.8 to 26.1), frequently overlooking contraindications like active infection. OpenAI models inappropriately recommended acute intravenous thrombolysis for MS cases (9.6% GPT 5.2; 6.4% GPT 5 mini) compared to below 1% for Gemini models. Expanded evaluation (to 10,000 cases) probed these errors in detail. 
Thrombolysis was recommended in 10.1% of cases lacking symptom timing information and paradoxically persisted (2.9%) even when symptoms were explicitly documented as more than 14 days old. Conclusion: Automated expert-level evaluation across 10,000 cases characterised artificial intelligence clinical blind spots hitherto invisible to small-scale testing. Massive-scale simulation and automated interrogation should become standard for uncovering serious failures and implementing safety guardrails before clinical deployment exposes patients to risk.
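The width of the confidence intervals quoted in this abstract follows from the sample size of 1,000 cases. As a rough check, the sketch below applies a normal-approximation (Wald) interval to the 7.2% rate reported for Gemini 3 Flash. The abstract does not state which interval method the authors used, so the Wald formula is an assumption; other methods (Wilson, bootstrap) would give slightly different bounds.

```python
import math

def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p observed in n cases."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# 7.2% appropriate steroid recommendations across 1,000 cases (from the abstract).
low, high = wald_ci(0.072, 1000)
print(f"{low * 100:.1f}% to {high * 100:.1f}%")  # 5.6% to 8.8%
```

The same formula shows why scaling to 10,000 cases helps with rare failures: the standard error shrinks by a factor of about 3.2 (the square root of 10).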
Coelho, J. A. P. d. M.; Nascimento da Paixao, A.; Guimaraes Almeida, B.; Näslund-Hadley, E.
Background: Childhood sensory and intellectual disabilities represent significant yet under-recognized barriers to learning and human capital development. This study analyzes the prevalence and severity of these conditions among 149.3 million children aged 5-19 years across 25 countries in Latin America and the Caribbean (LAC) using Global Burden of Disease 2023 data. Methods: We extracted GBD 2023 estimates for vision loss, hearing loss, and intellectual disability across 25 LAC countries, stratified by age, sex, and severity. Regional estimates were calculated using population-weighted averages. Severity distributions were compared with OECD countries to contextualize regional patterns. Results: These conditions are estimated to affect 9,282,921 children (6.22%; 95% UI: 5.89-6.54%). Hearing loss was predominant, affecting an estimated 5.42 million (3.63%, 3.41-3.86), with 87.6% mild-to-moderate. Intellectual disability was estimated to affect 2.56 million (1.71%, 1.58-1.85), with 61.7% borderline-to-mild. Vision loss was estimated to affect 1.30 million (0.87%, 0.79-0.96), with 89% of cases addressable with spectacles. Prevalence increased with age across all conditions. Male predominance was consistent for intellectual disability (2.00% vs 1.42%). Annual economic cost totaled US$19.3-29.0 billion, while comprehensive interventions would require US$9.45-14.23 billion with benefit-cost ratios of 2:1 to 15:1. Conclusions: The distribution of children across milder levels of difficulty underscores the opportunity for education and public health systems to provide timely and accessible support. With approximately 88% of sensory impairments addressable through established technologies, investments in inclusive services can yield strong social and economic returns.
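The headline prevalence can be reproduced from the counts given in the abstract. The short check below uses only figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the abstract.
population = 149_300_000          # children aged 5-19 in the 25 LAC countries
affected = 9_282_921              # children with any of the three conditions

prevalence_pct = affected / population * 100
print(f"overall prevalence: {prevalence_pct:.2f}%")

# Per-condition estimates in millions, as reported.
by_condition = {"hearing loss": 5.42, "intellectual disability": 2.56, "vision loss": 1.30}
total_millions = sum(by_condition.values())
print(f"sum of conditions: {total_millions:.2f} million")
```

The per-condition estimates sum to the overall figure, which suggests the total treats the three conditions as additive rather than adjusting for children with more than one condition; the abstract does not say how co-occurrence was handled.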
Dolin, P.; Keogh, K. A.; Rowell, J.; Edmonds, C.; Kielar, D.; Meyers, J.; Esterberg, E.; Nham, T.; Chen, S. Y.
Purpose: We evaluated healthcare resource utilization (HCRU) and costs in patients with eosinophilic granulomatosis with polyangiitis (EGPA). Methods: Patients with newly diagnosed EGPA (2017-2021), ≥12 months' pre-diagnosis health plan enrollment, and ≥1 inpatient or ≥2 outpatient claims with an EGPA diagnosis were included. Follow-up was from EGPA diagnosis until disenrollment or database end. HCRU and health insurer payment costs during follow-up were compared with those for matched cohorts of general insured patients without EGPA (comparison A) and without EGPA but with severe uncontrolled asthma (SUA; comparison B). Results: In comparison A, all-cause HCRU was higher in the EGPA cohort (n = 213) versus matched patients (n = 779) for all clinical encounters/pharmacy claim types; annualized mean total all-cause costs were 16-fold higher ($117,563/patient) versus matched patients ($7,520/patient). In comparison B, all-cause HCRU was higher for the EGPA cohort (n = 182) versus the matched SUA cohort (n = 640) for all clinical encounters/pharmacy claim types, with 5-fold higher mean total all-cause costs ($118,127/patient vs $22,286/patient). In both EGPA cohorts, HCRU and associated costs increased between the baseline and follow-up periods. Conclusions: These findings highlight the need for more effective treatments to reduce the clinical and economic burden of EGPA.
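The fold differences quoted above are simple ratios of the per-patient means. The sketch below recomputes them from the reported dollar figures and also shows the absolute difference per patient, which the abstract does not state explicitly; the dictionary layout is illustrative.

```python
# Annualized mean total all-cause costs per patient (US$), as reported in the abstract.
costs = {
    "A (general insured)": {"egpa": 117_563, "matched": 7_520},
    "B (severe uncontrolled asthma)": {"egpa": 118_127, "matched": 22_286},
}

fold = {name: c["egpa"] / c["matched"] for name, c in costs.items()}
excess = {name: c["egpa"] - c["matched"] for name, c in costs.items()}

for name in costs:
    print(f"comparison {name}: {fold[name]:.1f}-fold, excess ${excess[name]:,}/patient")
```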
Khan, M. H.; Chakraborty, S.; Marin-Pardo, O.; Barisano, G.; Borich, M. R.; Cole, J. H.; Cramer, S. C.; Fokas, E. E.; Fullmer, N. H.; Hayes, L.; Kim, H.; Kumar, A.; Rosario, E. R.; Schambra, H. M.; Schweighofer, N.; Taga, M.; Winstein, C.; Liew, S.-L.
Post-stroke cognitive recovery is difficult to predict using focal lesion characteristics alone. The brain's capacity to maintain cognitive function depends also on structural integrity of the whole brain. One way to measure brain health is through the severity of cerebral small vessel disease (CSVD) markers, which reflect aging-related pathologies that erode structural integrity. Here, we propose a composite measure of CSVD (cCSVD) integrating three independently validated biomarkers automatically quantified using T1-weighted MRIs: white matter hyperintensity volume (WMH; representing vascular injury), perivascular space count (PVS; putative glymphatic clearance), and brain-predicted age difference (brain-PAD; structural atrophy). We hypothesize that cCSVD, which captures the shared variance across these CSVD biomarkers, will be a robust indicator of whole-brain structural integrity and predict cognitive changes 3 months after stroke. We analyzed 65 early subacute stroke survivors with assessments within 21 days (baseline) and at 90 days (follow-up) post-stroke. WMH volume, PVS count, and brain-PAD were quantified from baseline T1-weighted MRIs, and then residualized for age, sex, days since stroke, and intracranial volume. Principal component analysis (PCA) of the residualized biomarkers was used to derive cCSVD. Beta regression with stability selection using LASSO was used to model three outcomes: baseline Montreal Cognitive Assessment (MoCA) scores, follow-up MoCA scores, and longitudinal change (follow-up score adjusted for baseline score). Logistic regression was used to test if baseline cCSVD predicted improvement in those with baseline cognitive impairment (MoCA < 26). The PCA revealed that the first principal component (PC1) explained 43.1% of the total variance among WMH volume, PVS count, and brain-PAD. The three biomarkers contributed nearly equally to PC1, which was subsequently used as the baseline cCSVD score. 
Lower baseline cCSVD was significantly associated with better MoCA scores at follow-up (β = -0.19, p = 0.009), even after adjusting for baseline MoCA (β = -0.12, p = 0.042), and, importantly, outperformed all individual biomarkers. Furthermore, lower cCSVD at baseline significantly increased the likelihood of improving to cognitively unimpaired status at three months (OR = 0.34, p = 0.036), independent of age and education. The composite CSVD captures the additive impact of vascular injury, glymphatic dysfunction, and structural atrophy on recovery in a way that individual measures do not. cCSVD accounts for shared variance across these domains, reflecting a patient's latent capacity for cognitive recovery, where relative integrity in one CSVD domain may mitigate effects of another. This automated, T1-based framework offers a scalable tool for predicting post-stroke recovery.
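The composite score described in this abstract is built in two steps: each biomarker is residualized for covariates, and the first principal component of the residuals is taken as cCSVD. The sketch below illustrates those two steps with NumPy on synthetic data. All data, function names, and the use of ordinary least squares for residualization are assumptions for illustration; the abstract does not specify the implementation.

```python
import numpy as np

def residualize(y, covariates):
    """Remove the linear effect of covariates (plus intercept) from y via least squares."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def first_principal_component(features):
    """PC1 scores, loadings and explained-variance ratio of standardized features."""
    z = (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    loadings = vt[0]                  # sign of a principal component is arbitrary
    return z @ loadings, loadings, explained[0]

# Synthetic stand-ins: 65 participants, 4 covariates, 3 biomarkers sharing one latent factor.
rng = np.random.default_rng(0)
n = 65
covs = rng.normal(size=(n, 4))        # e.g. age, sex, days since stroke, intracranial volume
latent = rng.normal(size=n)
biomarkers = np.column_stack([latent + rng.normal(size=n) for _ in range(3)])

resid = np.column_stack([residualize(biomarkers[:, j], covs) for j in range(3)])
scores, loadings, pc1_variance = first_principal_component(resid)
print(f"PC1 explains {pc1_variance:.1%} of variance; loadings = {np.round(loadings, 2)}")
```

With three standardized inputs, PC1 must explain at least one third of the variance, so the 43.1% reported in the abstract indicates modest shared variance above that floor.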
Shireman, J.; Mukherjee, N.; Brackman, K.; Kurtz, N.; Patniak, A.; McCarthy, L.; Gonugunta, N.; Ammanuel, S.; Dey, M.
Objectives: Academic medical institutions are the gatekeepers of the physician workforce and shape the future of medicine by regulating medical school admissions as well as residency training. Although the field of medicine is broadly seeing more representation from traditionally underrepresented groups, the critical decision-making platform of academic medicine continues to be uncharacteristically homogeneous, represented mainly by white males. This is even more pronounced in surgical subspecialties, such as academic neurosurgery. This study aims to quantify this phenomenon, uncover its driving factors, and define opportunities for improvement. Methods: Using a mixed research methodology, academic neurosurgical faculty in the U.S. were identified, and their demographic data were collected. An internet search using Google Scholar and Scopus was conducted to determine scholarly activity using number of publications and h-index. Results: We found a significant increase in female faculty in academic neurosurgery within the last decade. Comparing faculty rank between male and female faculty, we found that the largest share of female faculty are at the assistant professor level (n=36/79; 45.6%), while the largest share of male faculty are at the full professor rank (n=265/582; 45.5%). A similar trend was seen for under-represented minority neurosurgery faculty. Strong scholarly activity correlated with a departmental chair position for male faculty; however, this trend did not hold for female faculty. There was a significant difference in the number of publications and h-index in female vs male faculty, but only when including male faculty outliers at the full professor level. Conclusion: Slowly but steadily, academic neurosurgery is making progress towards a more diverse and representative workforce in the U.S. that better reflects the patient population. 
Facilitating timely progression of female and URM neurosurgeons into senior professorship and academic leadership roles will further advance this essential progress.
Basharat, A.; Hamza, O.; Rana, P.; Odonkor, C. A.; Chow, R.
Introduction: Large language models are increasingly being used in healthcare. In interventional pain medicine, clinical reasoning is essential for procedural planning. Prior studies show that simplified prompts reduce clinical detail in AI-generated responses. It remains unclear whether this reflects knowledge loss or simply prompt-driven suppression of information. Methods: We performed a controlled comparative study using 15 standardized low back pain questions common in interventional pain practice. Each question was submitted to ChatGPT under three conditions: a professional-level prompt (DP), a fourth-grade reading-level prompt (D4), and clinician-directed rewriting of the D4 response to a medical level (U4→MD). No follow-up prompting was allowed. Three physicians independently rated responses for accuracy using a 0-2 ordinal scale. Clinical completeness was determined by consensus. Word count and Flesch-Kincaid Grade Level (FKGL) were also measured. Paired t-tests compared conditions. Results: Accuracy was highest with professional prompting (1.76). Accuracy declined with the fourth-grade prompt (1.33; p = 0.00086). When simplified responses were rewritten for clinicians, accuracy returned to baseline (1.76; p ≈ 1.00 vs DP). Clinical completeness followed the same pattern: DP 80.0%, D4 6.7%, U4→MD 73.3%. Fourth-grade responses were shorter and less complex. Upscaled responses were more complex and similar in length to professional responses. Inter-rater reliability was low (Fleiss κ = 0.17), but trends were consistent across conditions. Conclusions: Reduced clinical detail under simplified prompts appears to reflect constrained output rather than loss of knowledge. Clinician-directed reframing restores omitted content. LLM performance in interventional pain depends strongly on prompt design and intended audience.
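The Flesch-Kincaid Grade Level used above to measure response complexity is a fixed linear formula in average sentence length and average syllables per word. The sketch below implements the standard formula from pre-computed counts; it deliberately leaves syllable counting to the caller, since that step relies on heuristics that vary between tools. The example counts are hypothetical and not taken from the study.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts of a text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical response: 100 words in 5 sentences with 150 syllables.
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(round(grade, 2))  # 9.91
```

Shorter sentences and shorter words both lower the score, so a fourth-grade prompt can reduce FKGL without necessarily changing what the model knows.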
Pavlidis, D. I.; Fischer, C. E.; Jennings, M. A.; Machlin, J. H.; Jan, V.; Baker, B. M.; Shikanov, A.
Research question: Can tissue clearing, combined with volumetric imaging, enable reliable, quantitative three-dimensional analysis of follicles and vasculature in intact human ovarian tissue? Design: A CUBIC-based clearing protocol was adapted for human ovarian medulla and cryopreserved cortex. Tissue from reproductive-aged donors was cleared, fluorescently labeled, and imaged using confocal and light sheet microscopy. Tissue expansion, imaging depth, and vascular morphometrics were quantified and follicle density was compared to conventional histology. Results: Clearing produced optically transparent tissue with a linear expansion factor of 1.2 across cortex and medulla. Imaging depth increased 6.5-11-fold in cortex and 6-8-fold in medulla. Follicle density measurements in immunolabeled cleared cortex were comparable to histology, supporting the validity of volumetric follicle quantification. Light sheet microscopy of lectin-labeled cortex revealed no significant donor-to-donor differences in vascular morphometrics, including mean vessel diameters of 12-14 µm, branch point densities of 632-965 points/mm³, vessel length densities of 117-175 mm/mm³, and volume fractions of 1.9-2.3%. Volumetric imaging further illustrated heterogeneous spatial relationships between follicles and surrounding vessels. Conclusion: Tissue clearing and volumetric imaging complement routine histology and enable quantitative three-dimensional investigation of follicle-vascular interactions in intact human ovarian tissue, providing a framework for advancing fertility preservation and ovarian tissue transplantation research.
Vattipally, V. N.; Jillala, R. R.; Kramer, P.; Elshareif, M.; Singh, S.; Jo, J.; Suarez, J. I.; Sakran, J. V.; Haut, E. R.; Huang, J.; Bettegowda, C.; Azad, T. D.
Background: Prognostication after moderate-to-severe traumatic brain injury (TBI) rarely captures long-term functional recovery, despite its importance to patients, families, and clinicians. Large trauma registries such as the Trauma Quality Improvement Program (TQIP) dataset contain detailed clinical data but lack systematic follow-up, limiting their ability to study longer-term functional outcomes. Methods: We developed and externally validated a machine learning model to predict favorable six-month functional outcome (GOS MD/GR or GOSE >=5) using harmonized data from two randomized clinical trials: CRASH (training) and ROC-TBI (validation). Five candidate classifiers (random forest [RF], linear discriminant analysis, k-nearest neighbors, naive Bayes, and support vector machine) were trained using seven shared clinical predictors. Models were evaluated using ROC-AUC, calibration metrics, and performance at the Youden optimal threshold and a high-sensitivity secondary threshold. The final model was applied to patients with moderate-to-severe TBI in the national TQIP registry (2017-2022) to estimate population-level recovery patterns. Results: The RF model demonstrated the highest overall performance after recalibration, achieving strong discrimination (AUC internal and external, 0.887 and 0.784), good calibration, and high sensitivity (0.890) and negative predictive value (0.909). Applied to 63,289 patients from TQIP, the model estimated that 45% would achieve favorable six-month outcomes at the Youden optimal threshold and 57% at the high-sensitivity threshold, with predicted recovery aligning with established clinical correlates such as younger age, higher admission GCS, and lower rates of penetrating or brainstem injuries. Conclusion: A machine learning model trained on high-quality trial data can generate clinically plausible estimates of long-term functional recovery when applied at scale to national trauma registries that lack systematic follow-up. 
This approach enables imputation of functional outcomes in datasets lacking follow-up, supports benchmarking and quality improvement across trauma systems, and provides a foundation for future models incorporating physiologic time-series, imaging, and biomarker data.
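The Youden optimal threshold mentioned in the abstract can be illustrated with a minimal sketch. The labels and scores below are invented for illustration and do not come from CRASH, ROC-TBI, or TQIP; the function name `youden_threshold` is likewise hypothetical. The sketch picks the cut-off that maximises J = sensitivity + specificity - 1.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A case is called positive when its score is >= the threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Invented toy data (1 = favorable outcome), NOT taken from the study.
y_true = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.02, 0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.95]

threshold, j = youden_threshold(y_true, scores)
print(f"Youden threshold = {threshold}, J = {j:.3f}")
```

A high-sensitivity secondary threshold, as the abstract describes, would instead be chosen by fixing a target sensitivity and accepting the specificity that results; the abstract does not state which target the authors used.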
Jiang, Q.; Ke, Y.; Sinisterra, L. G.; Elangovan, K.; Li, Z.; Yeo, K. K.; Jonathan, Y.; Ting, D. S. W.
Coronary artery disease is a leading cause of morbidity and mortality. Invasive coronary angiography is currently the gold standard in disease diagnosis. Several studies have attempted to use artificial intelligence (AI) to automate its interpretation, with varying levels of success. However, most existing studies cannot generate detailed angiographic reports beyond simple classification or segmentation. This study aims to fine-tune and evaluate the performance of a Vision-Language Model (VLM) in coronary angiogram interpretation and report generation. Using twenty thousand angiogram keyframes from 1987 patients collated across four unique datasets, we fine-tuned the InternVL2-4B model with Low-Rank Adaptor weights to perform stenosis detection, anatomy labelling, and report generation. The fine-tuned VLM achieved a precision of 0.56, recall of 0.64, and F1-score of 0.60 for stenosis detection. In anatomy segmentation, it attained a weighted precision of 0.50, recall of 0.43, and F1-score of 0.46, with higher scores in major vessel segments. Report generation integrating multiple angiographic projection views yielded an accuracy of 0.42, negative predictive value of 0.58 and specificity of 0.52. This study demonstrates the potential of using a VLM to streamline angiogram interpretation to rapidly provide actionable information to guide management, support care in resource-limited settings, and audit the appropriateness of coronary interventions. AUTHOR SUMMARY: Coronary artery disease carries a heavy disease burden worldwide, and coronary angiography is the gold-standard imaging modality for its diagnosis. Interpreting these complex images and producing clinical reports requires significant expertise and time. In this study, we fine-tuned and investigated an open-source VLM, InternVL2-4B, to interpret and report coronary angiogram images in key tasks including stenosis detection, anatomy identification, and full report generation. 
We also benchmarked the fine-tuned InternVL2-4B against a state-of-the-art segmentation model, YOLOv8x, which was evaluated on the same test sets. We examined how machine learning metrics like the intersection over union score may not fully capture the clinical accuracy of model predictions and discussed the limitations of relying solely on these metrics for evaluating clinical AI systems. Although the model has not yet achieved expert-level interpretation, our results demonstrate the potential and feasibility of automating the reporting of coronary angiograms. Such systems could potentially assist cardiologists by improving reporting efficiency, highlighting lesions that may require review, and enabling automated calculations of clinical scores such as the SYNTAX score.
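As a quick arithmetic check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values quoted in the abstract; the helper name `f1_score` is introduced here for illustration. One caveat: a weighted F1 is normally the support-weighted average of per-class F1 values, so recomputing it from the weighted precision and recall is only an approximation, although it happens to agree with the reported figure in this case.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract.
stenosis_f1 = f1_score(0.56, 0.64)
# Approximation only: weighted F1 is usually averaged per class.
anatomy_f1 = f1_score(0.50, 0.43)

print(f"stenosis F1 ~ {stenosis_f1:.2f}")
print(f"anatomy F1 ~ {anatomy_f1:.2f}")
```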
Koulidiati, J.-L.; Zoma, R. L.; Nebie, E. I.; Soumaila, Y.; Neya, C. O.; Kiendrebeogo, J. A.; Debellut, F.
Background: In Burkina Faso, typhoid fever remains a major public health concern, with a high incidence among children younger than 15 years of age. To address this burden, the country introduced typhoid conjugate vaccine in January 2025 through a national vaccination campaign reaching children aged 9 months to 14 years. This study aimed to estimate the cost of typhoid conjugate vaccine delivery during the national campaign and to identify the main cost drivers across different administrative levels. Methods: We conducted a cross-sectional, retrospective costing study using a microcosting approach from the government perspective. We collected data from fifty health facilities, eight health districts, five health regions, and the national level. Financial and economic costs were estimated for each level, excluding vaccine and syringe costs. All costs were converted to 2024 USD using the official exchange rate. Findings: Vaccinators administered a total of 10.5 million typhoid conjugate vaccine doses. The average financial cost per dose was $0.47 (95% CI: $0.39-$0.51), and the economic cost was $2.16 (95% CI: $1.71-$2.56). Human resources and per diem payments were the main contributors to costs. Costs varied by geography, delivery strategy, and security context, with higher costs observed in rural and conflict-affected areas. The mobile-temporary posts strategy had the highest economic cost per dose ($2.02; 95% CI: $1.64-$2.40), while the fixed strategy had the highest financial cost per dose ($0.41; 95% CI: $0.32-$0.49). Conclusion: The financial cost per dose remained within Gavi, the Vaccine Alliance's operational support range. The observed cost variations highlight the need for targeted funding and enhanced logistical support to ensure equitable access, particularly in rural and insecure areas. 
This study provides evidence to inform future vaccination campaigns and supports decision-making for typhoid conjugate vaccine introduction in other countries in the region.
Bhalerao, G. V.; Markiewicz, P.; Turnbull, J.; Thomas, D. L.; De Vita, E.; Parkes, L.; Thompson, G.; MacKewn, J.; Krokos, G.; Wimberley, C.; Hallett, W.; Su, L.; Malhotra, P.; Hoggard, N.; Taylor, J.-P.; Brooks, D.; Ritchie, C.; Wardlaw, J.; Matthews, P.; Aigbirho, F.; O'Brien, J.; Hammers, A.; Herholz, K.; Barkhof, F.; Miller, K.; Matthews, J.; Smith, S.; Griffanti, L.
Harmonisation is widely used to mitigate site- and scanner-related batch variability in multisite neuroimaging studies and is particularly critical in longitudinal clinical trials, where detection of subtle biological or treatment-related changes depends on reliable measurement across scanners and timepoints. However, the effectiveness of harmonisation in small, heterogeneous clinical datasets remains insufficiently understood, particularly in relation to subject-level variability and consistency across acquisition settings, and its impact on both removal of technical variability and preservation of biological variation in pooled multisite analyses. We systematically evaluated a range of image-based and statistical harmonisation methods using a clinically realistic multisite, multiscanner structural T1-weighted (T1w) MRI test-retest dataset comprising three controlled acquisition scenarios: repeatability, intra-scanner reproducibility and inter-scanner reproducibility. Methods were applied under different batch specifications (site, scanner, or both) and performance was assessed within each scenario and in pooled data using a multi-metric framework capturing both technical and biological variability in volumetric imaging-derived phenotypes (IDPs) relevant to aging and dementia research. Across IDPs, variability before harmonisation was lowest in the repeatability scenario (median variability = 0.6 to 2.7%, rank consistency ρ ≥ 0.9), with modest increases under intra-scanner reproducibility (0.5 to 3.2%, ρ = 0.5 to 1.0) and substantially greater variability under inter-scanner reproducibility conditions (1.7 to 19.2%, ρ = -0.1 to 0.9). These results offer important information to consider for multisite study design, including sample size calculation in clinical trials. 
Harmonisation performance was strongly context dependent, with clearer benefits emerging in inter-scanner scenarios, where both variability reduction and improvements in subject-level consistency were observed. In pooled data, approaches that explicitly modelled site as batch and accounted for repeated-measure structure showed greater consistency across IDPs in batch effect mitigation and more accurately reflected underlying biological variation. Our evaluation metrics enabled disentangling the removal of global batch effects while highlighting residual variability at the phenotype-specific or multivariate levels. These findings demonstrate that harmonisation cannot be treated as a one-size-fits-all solution and must be interpreted relative to the acquisition context, dataset structure, and downstream analytic goals. Multi-metric evaluation under realistic clinical constraints is essential to support reliable and translatable neuroimaging inference by ensuring appropriate correction of batch effects while preserving longitudinal biological signals and sensitivity to clinically meaningful change in multisite studies.
Chen, Y.; Law, Z. K.; Zhou, X.; Dai, Q.; Xiang, S.; Xiao, X.; Ma, J.; Feng, M.; Peng, W.; Zhou, S.; Chen, L.; Zhou, Y.; Lai, Y.; Yeo, L.; An, S.; He, Y.; Pan, S.-Y.
Objective: To compare the safety and efficacy of bridging intravenous thrombolysis (IVT) plus endovascular thrombectomy (EVT) versus direct EVT in patients with acute ischemic stroke (AIS) due to anterior circulation large vessel occlusion (LVO) treated within the 6- to 24-hour time window. Methods: This is a retrospective analysis of a prospective EVT registry from 10 comprehensive stroke centers in China and Singapore between 2019 and 2024. Eligible patients had anterior circulation LVO, underwent EVT within 6-24 hours of onset, and had ASPECTS ≥6, NIHSS ≥6, and pre-stroke mRS ≤2. Patients were stratified into bridging IVT + EVT (IVT group) versus direct EVT alone (non-IVT group). Propensity score matching (1:2 ratio) was performed to balance baseline covariates. The primary outcome was 3-month favorable functional outcome (mRS 0-2). Secondary outcomes included successful recanalization (mTICI 2b-3), symptomatic intracranial hemorrhage (sICH), hemorrhagic transformation (HT) and 3-month mortality. In the matched cohort, binary outcomes were compared using the Cochran-Mantel-Haenszel test. Results: Of 772 included patients, 110 (14.2%) received bridging IVT and 662 (85.8%) received direct EVT. After propensity score matching, 202 non-IVT patients were matched to 101 IVT patients, with all covariates well-balanced (absolute SMD <0.10). In the matched cohort, bridging IVT was not associated with a significant difference in 3-month favorable outcome (44.55% vs. 47.03%; common OR 0.91; 95% CI 0.56-1.46), successful recanalization (91.09% vs. 90.10%; OR 1.11; 0.51-2.44), sICH (5.94% vs. 9.41%; OR 0.61; 0.24-1.58), HT (23.76% vs. 23.27%; OR 1.03; 0.57-1.85), or 3-month mortality (15.84% vs. 13.37%; OR 1.22; 0.62-2.37). 
Conclusion: In this large multicenter propensity score-matched analysis, bridging intravenous thrombolysis before endovascular thrombectomy in the 6- to 24-hour time window was not significantly associated with improved efficacy or increased safety risks compared with direct endovascular therapy alone.
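To make the reported odds ratio concrete, the sketch below recomputes a crude (unstratified) odds ratio for the primary outcome with a Woolf log-based 95% confidence interval. The event counts are back-calculated from the percentages and matched group sizes quoted in the abstract, which is an assumption, and the function name `crude_odds_ratio` is hypothetical. The authors report a Cochran-Mantel-Haenszel common odds ratio, which is stratified by matched set, so this crude estimate is an approximation and not their method.

```python
import math


def crude_odds_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Crude odds ratio (group A vs. group B) with a Woolf log-based CI."""
    a, b = events_a, n_a - events_a  # group A: events, non-events
    c, d = events_b, n_b - events_b  # group B: events, non-events
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts back-calculated from the abstract (assumption):
# 44.55% of 101 IVT patients and 47.03% of 202 non-IVT patients.
ivt_events = round(0.4455 * 101)
non_ivt_events = round(0.4703 * 202)

or_, lo, hi = crude_odds_ratio(ivt_events, 101, non_ivt_events, 202)
print(f"crude OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

On these back-calculated counts, the crude estimate lands close to the reported common odds ratio, which is what one would expect when matching leaves the groups well balanced.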
Yu, X.; Yan, R.; Li, H.; Xie, Y.; Bi, M.; Li, Y.; Roccuzzo, A.; Tonetti, M. S.
Aim: To comprehensively characterize the salivary proteome in periodontitis using Orbitrap Astral data-independent acquisition mass spectrometry (DIA-MS), identify an atlas of differentially expressed proteins (DEPs), and develop a machine learning-derived multi-protein biomarker panel for non-invasive diagnosis of stage III/IV periodontitis. Materials and Methods: Unstimulated saliva samples from 199 participants (periodontal health/gingivitis, n=120; stage III/IV periodontitis, n=79) were analyzed by Orbitrap Astral DIA-MS. DEPs were identified, and pathway enrichment analysis was performed. A two-tier machine learning pipeline, integrating pathway-based feature selection with cross-validated evaluation, was applied to identify the optimal diagnostic panel. Results: Orbitrap Astral DIA-MS quantified 5,597 salivary proteins and 1,966 DEPs (|log2FC|>0.5, FDR<0.05). Pathway analysis identified 14 periodontitis-relevant KEGG pathways, including Th17 cell differentiation, IL-17 signaling, neutrophil extracellular trap formation, and complement and coagulation cascades. A four-protein panel (TEC, RAC1, MAPK14, KRT17) achieved an area under the curve (AUC) of 0.985 ± 0.010, with 83% sensitivity and 100% specificity. The panel was corroborated using public datasets. Conclusions: To our knowledge, this study represents the first application of Orbitrap Astral DIA mass spectrometry in periodontitis research, establishing a disease-specific DEP atlas and a salivary biomarker panel with high diagnostic accuracy for stage III/IV periodontitis, providing a foundation for future external validation studies.
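The DEP criterion quoted above (|log2FC|>0.5, FDR<0.05) can be sketched as a simple filter. The abstract does not say how the FDR was controlled; the Benjamini-Hochberg adjustment used here is an assumption, and the protein names, fold changes, and p-values are invented placeholders rather than study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented placeholder data: (protein, log2 fold change, raw p-value).
proteins = [
    ("PROT_A", 1.2, 0.001),
    ("PROT_B", -0.8, 0.02),
    ("PROT_C", 0.3, 0.04),
    ("PROT_D", 2.0, 0.30),
]

fdr = benjamini_hochberg([p for _, _, p in proteins])

# Keep proteins passing both thresholds quoted in the abstract.
deps = [
    name
    for (name, log2fc, _), q in zip(proteins, fdr)
    if abs(log2fc) > 0.5 and q < 0.05
]
print(deps)
```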